The YouTube Social Network

Authors

  • Mirjam Wattenhofer
  • Roger Wattenhofer
  • Zack Zhu

Abstract

Today, YouTube is the largest user-driven video content provider in the world; it has become a major platform for disseminating multimedia information. A major contribution to its success comes from the user-to-user social experience that differentiates it from traditional content broadcasters. This work examines the social network aspect of YouTube by measuring the full-scale YouTube subscription graph, comment graph, and video content corpus. We find that YouTube deviates significantly from the network characteristics that mark traditional online social networks, such as homophily, reciprocative linking, and assortativity. However, compared with the reported characteristics of another content-driven online social network, Twitter, YouTube is remarkably similar. Examining the social and content facets of user popularity, we find that a user’s social popularity correlates more strongly with his/her most popular content than with typical content popularity. Finally, we demonstrate an application of our measurements for classifying YouTube Partners, who are selected users that share YouTube’s advertisement revenue. Results are motivating despite the highly imbalanced nature of the classification problem.

Introduction

YouTube is a key international platform for socially-enabled media diffusion. According to public statistics, more than 48 hours of video content is uploaded every minute and 3 billion views are generated every day. To complement the content broadcast/consume experience, YouTube connects seamlessly with major online social networks (OSNs) such as Facebook, Twitter, and Google+ to facilitate off-site diffusion. In fact, 12 million users have linked their YouTube account with at least one such OSN for auto-sharing, and more than 150 years of YouTube content is watched on Facebook every day.
More importantly, YouTube serves as a popular social network on its own, connecting registered users through subscriptions that notify subscribers of social and content updates of the subscribed-to users. In this paper, we leverage full-scale data of the YouTube social network to answer practical questions from a graph-theoretic point of view. We shed light on the following:

1. What can be observed from the complete YouTube social network topology? How does it compare with other social networks in terms of intrinsic properties and emergent observations?
2. On the YouTube platform, how do users connect and interact with each other? What is the relationship between the explicit and implicit social graphs describing the subscription and commenting relationships?
3. What constitutes popularity on YouTube? How does a user’s topological (social) popularity correlate with his/her content popularity?

∗Zack Zhu conducted this research while visiting Google. He is currently in the Wearable Computing Laboratory at ETH Zurich.
Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Our analyses will illustrate that YouTube deviates significantly from traditional OSN characteristics. However, it concurs with the observations of Kwak, Lee, Park, and Moon (2010) for the Twittersphere. We will see a surprising dichotomy of content and social activities on the YouTube platform, indicating that YouTube is, distinctly, as much a social network as it is a content-diffusion platform. Finally, we note that social popularity on YouTube correlates more with the maximum content popularity achieved than with summary measures of content popularity. These observations lead to the conjecture that a new class of social network is emerging, a type that facilitates indirect socialization via a gluing content layer in between directed user-to-user interaction.
Potentially, a paradigm shift is taking place for OSNs such that what constitutes “social” now incorporates user-content-user interaction in addition to the traditional user-user interaction. Through intrinsically different linking and interacting characteristics, these content-driven social networks create new social dynamics and necessitate further research to better understand their role in the processes of socialization and information diffusion.

Background and Related Work

The recent surge in OSN popularity has attracted the attention of researchers from a variety of fields. Here, the surveyed works are divided into two categories: OSN measurements and machine learning-based OSN applications.

Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media

OSN Measurements

Researchers have taken advantage of the YouTube Data API to measure a variety of metrics in relation to video popularity on YouTube. Studies by Cha et al. (2007), Benevenuto et al. (2009), and Cheng et al. (2008) all analyze this video corpus for the purposes of understanding video popularity. However, by measuring at the video level, user-based social characteristics are largely omitted. We build on these works by aggregating YouTube’s video corpus metrics at the user level to complement content metrics with social topology. In this way, we are able to make the connection between video content popularity and corresponding social popularity. Even though these works make efforts to avoid sampling biases while accessing the data through a rate-limiting YouTube API and/or online crawlers, the datasets collected are only fractions of the entire corpus. In this work, we obtain data from within YouTube to allow complete measurements.

Mislove, Marcon, Gummadi, Druschel, and Bhattacharjee (2007), Krishnamurthy et al. (2008), and Kwak et al. (2010) have reported measurements of various major online social networks. In the work of Mislove et al.
(2007), an array of graph-based measurements is presented for multiple popular online social networks, including YouTube. They present a framework of measurements that we adopt here for ease of comparison. In their measurement methodology, a mix of API and HTML scraping techniques is used to obtain a sampled version of the social graph. However, as the authors themselves pointed out, their methodology is limited when trying to extrapolate observations to the entire YouTube population. The present work addresses this concern and compares results where appropriate. Krishnamurthy et al. (2008) examine topological features of a sampled Twitter network as well as content uploaded by users. Their work is similar to what we present for YouTube, as we analyze measurements from two social graphs and the video corpus. More recently, Kwak et al. (2010) conducted one of the first full crawls of a major online social network by measuring the entire Twittersphere. The size and coverage of their dataset are comparable to what we present here.

OSN Applications

In terms of user classification, De Choudhury et al. (2010) propose threshold networks with non-arbitrary thresholds for increased accuracy in both link prediction and user classification. In this work, the idea of thresholding to prune real-world datasets is used to illustrate an interesting relationship between explicit and implicit social relationships. Hong et al. (2011) leverage network characteristics to successfully predict popular messages, and Bakshy et al. (2011) classify “influential” users according to re-tweet quantities. A key similarity between these works is their use of various topological metrics calculated from the social graph. Our classification application uses such features as well. On top of the explicit social graph, topology measures of an implicit social graph and aggregated user-level metrics from the video corpus are also used.
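
As an illustration of the kind of topological metrics mentioned above, the following is a minimal sketch of how in-degree, out-degree, and reciprocal-link counts could be derived from a directed edge list. It is not the authors' pipeline (which runs on MapReduce and Pregel); the function name `nodal_features`, the feature keys, and the toy edge list are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def nodal_features(edges):
    """Compute per-node in-degree, out-degree and reciprocal-link counts.

    `edges` is an iterable of (source, target) pairs for a directed graph,
    e.g. (subscriber, subscribed_to). Names here are illustrative only.
    """
    edge_set = set(edges)  # duplicate links are counted once
    feats = defaultdict(lambda: {"out": 0, "in": 0, "reciprocal": 0})
    for src, dst in edge_set:
        feats[src]["out"] += 1
        feats[dst]["in"] += 1
        # A link is reciprocal when the reverse link also exists.
        if src != dst and (dst, src) in edge_set:
            feats[src]["reciprocal"] += 1
    return dict(feats)


# Toy example: a and b link to each other, c links to both.
edges = [("a", "b"), ("b", "a"), ("c", "a"), ("c", "b")]
features = nodal_features(edges)
```

On a graph with billions of links, the same per-node counts would have to be computed in a distributed fashion; the in-memory set used here is only suitable for small examples.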
Table 1: Nodal feature descriptions

Name              Description
user              Encrypted user id
sub.out           Out degree on subscription graph
sub.in            In degree on subscription graph
avg.pub.out       Average out degree of users subscribed to
avg.pub.in        Average in degree of users subscribed to
avg.sub.out       Average out degree of subscribers
avg.sub.in        Average in degree of subscribers
reciprocal        # of reciprocal links on subscription graph
sub.pagerank      PageRank of the subscription graph
com.in            In degree on the comment graph
com.out           Out degree on the comment graph
com.pagerank      PageRank on the comment graph
max.fav           Max # of times any video is favourited
med.fav           Median # of times any video is favourited
min.fav           Min # of times any video is favourited
max.views         Max # of times any video is viewed
med.views         Median # of times any video is viewed
min.views         Min # of times any video is viewed
max.coms          Max # of comments any video received
med.coms          Median # of comments any video received
min.coms          Min # of comments any video received
max.raters        Max # of raters for any video
med.raters        Median # of raters for any video
min.raters        Min # of raters for any video
max.avg.rating    Max of average ratings for any video
med.avg.rating    Med of average ratings for any video
min.avg.ratings   Min of average ratings for any video
main.cat          Category that most videos are uploaded in
uploads           Number of videos posted

Measuring All of YouTube

As mentioned in the previous section, a multitude of work has sampled various major OSNs through online crawls and/or API usage. However, few measurement projects have captured whole social graphs without compromise. This work leverages the data and computing power available within Google to shed light on a major social platform. The data collection process mainly utilizes MapReduce (Dean and Ghemawat 2008) and Pregel (Malewicz et al. 2010), a large-scale proprietary graph computing framework, to leverage Google’s computing resources.
Therefore, complete social graphs of the YouTube social network can be captured and processed in tens of minutes. We base our analyses of the YouTube social network on three main corpora of data: the explicit social graph depicting subscriptions, the implicit social graph depicting commenting activities, and aggregated metrics of user-uploaded content. These datasets were captured in August 2011. We removed axis labels on our plots to preserve data confidentiality.

We compose a directed graph to represent the subscription relationships of registered YouTube users. Each node represents one such user, while a link points from a subscriber to the user subscribed to. Therefore, this graph is composed of registered users who have subscribed to at least one user or received at least one subscription. Similarly, the comment graph is composed of users who have posted or received at least one comment. Again, links point from the commenter to the comment-receiving user. Both graphs contain nodes on the order of hundreds of millions and links on the order of billions, comparable to the measurement size of Kwak et al. (2010) for Twitter.

The third corpus of data is an aggregation of YouTube video metrics to the uploader level. For each uploader $\mu \in N$ who has uploaded at least one video, we can construct a video vector $v_\mu$. Then, for each of our video-level metrics (e.g. view count, number of video comments, etc.) we aggregate by taking the minimum, median, and maximum of $v_\mu$. For example, a specific user’s median average rating refers to the scalar $m_\mu = \mathrm{median}(v_\mu^1, v_\mu^2, \ldots, v_\mu^n)$, where each $v_\mu^i$ denotes the average rating of the $i$-th of the $n$ videos that user $\mu$ has uploaded. Table 1 presents the user features accumulated from the three datasets. The naming convention in Table 1 will be referred to consistently from here on.
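
The uploader-level aggregation described above (minimum, median, and maximum of a per-video metric) can be sketched in a few lines. This is only an illustrative, single-machine version under assumed inputs: the record layout (`uploader`, `views`), the function name `aggregate_by_uploader`, and the sample values are hypothetical and do not come from the paper's data.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_uploader(videos, metric):
    """Aggregate one video-level metric to the uploader level.

    `videos` is an iterable of dicts with an "uploader" key and a numeric
    value under `metric`. Returns, per uploader, the min, median and max of
    that metric plus the number of uploads (hypothetical output layout).
    """
    per_user = defaultdict(list)
    for video in videos:
        per_user[video["uploader"]].append(video[metric])
    return {
        user: {
            "min": min(values),
            "med": median(values),
            "max": max(values),
            "uploads": len(values),
        }
        for user, values in per_user.items()
    }


# Made-up sample records, for illustration only.
videos = [
    {"uploader": "u1", "views": 10},
    {"uploader": "u1", "views": 1000},
    {"uploader": "u1", "views": 40},
    {"uploader": "u2", "views": 7},
]
view_stats = aggregate_by_uploader(videos, "views")
```

In terms of Table 1, the three values computed for a view-count metric would correspond to `min.views`, `med.views`, and `max.views`. The example shows why the maximum can differ sharply from the median: a single highly viewed video dominates the former but barely moves the latter.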
Degree Distribution of Social Graphs

Starting with a basic degree distribution analysis of both social graphs, Figure 1 plots the log-log complementary cumulative distribution function (CCDF). As in most OSN studies, the degree distributions are presented as log-log CCDFs to better illustrate the tail behaviour on both ends. To interpret the plots: for a point (x, y), a fraction y of the population has a degree greater than x. Noticeably, there is a sharp kink in the subscription out-degree curve at x = x∗. This is an artifact of the YouTube subscription rule that limits the number of subscriptions for users who do not have a significant number of subscribers themselves. In both graphs, there exist extremely popular users who have millions of subscribers and/or commenters. However, about half of the sampled population has at most one subscriber and/or commenter. In the comment graph, the out-degree distribution also does not fit the power-law signature. Fitting a power-law distribution (via maximum likelihood estimation) to the in-degree of the subscription graph and of the comment graph, we find scaling exponents of 1.55 and 1.44, respectively. These exponents differ from those of the majority of real-world social networks, which have been measured between 2 and 3 (Kwak et al. 2010), as well as from the 1.99 reported for a sampled YouTube subscription graph (Mislove et al. 2007).

A Content-Driven OSN

Traditionally, researchers have modelled social networks, online or offline, as undirected links between users. YouTube is intrinsically different: the de facto mode of linking is through directed subscription links. Furthermore, user interaction is largely through uploaded video content. For example, as opposed to interacting directly (e.g. wall posts, direct messages), much of the interaction on YouTube takes place in a video-centered manner, such as rating another’s video or leaving a video comment.
Therefore, user interaction becomes very much a user-content-user relationship where users interact with each other through a gluing layer of uploaded content. These two inherent characteristics of

[Figure 1: Degree Distributions. Log-log CCDF versus degree, with in-degree and out-degree series.]
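
The degree-distribution analysis behind Figure 1 can be approximated on a small scale as follows. This is a hedged sketch, not the authors' method: the paper states only that the power-law fit uses maximum likelihood estimation, so the continuous-approximation estimator below (alpha = 1 + n / Σ ln(x_i / x_min)) is an assumption, as are the function names, the `x_min` parameter, and the toy degree sequence.

```python
import math
from collections import Counter


def ccdf(degrees):
    """Return sorted (x, y) points where y is the fraction of nodes
    whose degree is strictly greater than x."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining = n
    points = []
    for x in sorted(counts):
        remaining -= counts[x]  # nodes left have degree > x
        points.append((x, remaining / n))
    return points


def power_law_alpha(degrees, x_min=1):
    """Estimate a power-law scaling exponent by maximum likelihood,
    using the continuous approximation (an assumption of this sketch):
    alpha = 1 + n / sum(ln(x / x_min)) over all x >= x_min."""
    tail = [x for x in degrees if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree above x_min")
    return 1 + len(tail) / log_sum


# Made-up degree sequence, for illustration only.
degrees = [0, 1, 1, 2, 4, 8]
points = ccdf(degrees)
alpha = power_law_alpha(degrees, x_min=1)
```

When plotting on log-log axes, points with x = 0 or y = 0 have to be dropped, since their logarithms are undefined. The choice of `x_min` affects the estimated exponent noticeably, so any comparison with the exponents reported in the text would need the same cutoff, which the source does not specify.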


Related resources

Similarity and Ties in Social Networks: a Study of the YouTube Social Network

Social networks and the propagation of content within social networks have received an extensive attention during the past few years. Social network content propagation is believed to depend on the similarity of users as well as on the existence of friends in the social network. Our former investigation of the YouTube social network showed that strangers (non-friends and non-followers) play a m...


Word of Mouth Dynamics in Online Social Networks: Investigating Social Influence Cascades on YouTube

Introduction and Research Question With its user-friendly interface and the growth in popularity of online video, YouTube has catapulted to a dominant position on the Internet. While this model is extremely attractive for marketers and content creators, recent work has recognized the relatively ephemeral nature of popularity of videos on YouTube, where only a tiny fraction of videos managed att...


Increasing Policy Success through the Use of Social Media Cross-Channels for Citizen Political Engagement

In the ubiquitous digitization era, governments increasingly adopt multi-social media channels for the purpose of facilitating citizen engagement towards enhanced government transparency, external political efficacy and policy success. However, little is known about the use of social media cross-channel information-sharing mechanisms for promoting citizen political engagement. We draw on theori...


Virality over YouTube: an empirical analysis

Purpose: The purpose of this research is to seek reasons for some videos going viral over YouTube (a type of social media platform). Methodology:Using YouTube APIs (Application Programming Interface) and Webometrics analyst tool, we collected data on about 100 all-time-most-viewed YouTube videos and information about the users associated with the videos. We constructed and tested an empirical m...


Understanding the Characteristics of Internet Short Video Sharing: YouTube as a Case Study

Established in 2005, YouTube has become the most successful Internet site providing a new generation of short video sharing service. Today, YouTube alone comprises approximately 20% of all HTTP traffic, or nearly 10% of all traffic on the Internet. Understanding the features of YouTube and similar video sharing sites is thus crucial to their sustainable development and to network traffic engine...



Publication date: 2012